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Abstract 

A way to optimize performance of relational row store databases is to 
reduce the row widths by vertically partitioning tables into table fractions 
in order to minimize the number of irrelevant columns/attributes read by 
each transaction. This paper considers vertical partitioning algorithms 
for relational row-store OLTP databases with an H-store-like architec- 
ture, meaning that we would like to maximize the number of single-sited 
transactions. We present a model for the vertical partitioning problem 
that, given a schema together with a vertical partitioning and a workload, 
estimates the costs (bytes read/written by storage layer access methods 
and bytes transferred between sites) of evaluating the workload on the 
given partitioning. The cost model allows for arbitrarily prioritizing load 
balancing of sites vs. total cost minimization. We show that finding a 
minimum-cost vertical partitioning in this model is NP-hard and present 
two algorithms returning solutions in which single-sitedness of read queries 
is preserved while allowing column replication (which may allow a drasti- 
cally reduced cost compared to disjoint partitioning). The first algorithm 
is a quadratic integer program that finds optimal minimum-cost solutions 
with respect to the model, and the second algorithm is a more scalable 
heuristic based on simulated annealing. Experiments show that the algo- 
rithms can reduce the cost of the model objective by 37% when applied 
to the TPC-C benchmark and the heuristic is shown to obtain solutions 
with cost close to the ones found using the quadratic program. 

1 Introduction 



In this paper we consider OLTP databases with an H-store [? ] like architecture 
in which we would aim for maximizing the number of single-sited transactions 
(i.e. transactions that can be run to completion on a single site). Given a 
database schema and a workload we would like to reduce the cost of evaluating 
the workload. In row-stores, where each row is stored as a contiguous segment 
and access is done in quantums of whole rows, a significant amount of super- 
fluous columns/attributes (we will use the term attribute in the following) are 
likely to be accessed during evaluation of a workload. It is easy to see that this 
superfluous data access may have a negative impact on performance so in an 
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optimal world the amount of data accessed by each query should be minimized. 
One approach to this is to perform a vertical partitioning of the tables in the 
schema. A vertical partitioning is a, possibly non-disjoint, distribution of at- 
tributes and transactions onto multiple physical or logical sites. (Notice, that 
vertical and horizontal partitioning are not mutually exclusive and can perfectly 
be used together) . The optimality of a vertical partitioning depends on the con- 
text: OLAP applications with lots of many- row aggregates will likely benefit 
from parallelizing the transactions on multiple sites and exchanging small sub- 
results between the sites after the aggregations. OLTP applications on the other 
hand, with many short-lived transactions, no many-row aggregates and with few 
or no few-row aggregates would likely benefit from gathering all attributes read 
by a query locally on the same site: inter-site transfers and the synchronization 
mechanisms needed for non-single-sited or parallel queries (e.g. undo and redo 
logs) are assumed to be bottlenecks in situations with short transaction dura- 
tions. ? ] and ? ] discuss the benefits of single-sitedness in high-throughput 
OLTP databases in more details. 

This paper presents a cost model together with two algorithms that find 
either optimal or close-to-optimal vertical partitionings with respect to the cost 
model. The two algorithms are based on quadratic programming and simulated 
annealing, respectively. For a given partitioning and a workload, the cost model 
estimates the number of bytes read/ written by access methods in the storage 
layer and the amount of data transfer between sites. Our model is made with a 
specific setting in mind, captured by five headlines: 

OLTP The database is a transaction processing system with many short lived 
transactions. 

Aggregates No many-row aggregates and few (or no) aggregates on small row- 
subsets. 

Preserve single-sitedness We should try to avoid breaking single-sitedness 
as a large number of single-sited transactions will reduce the need for 
inter-site transfers and completely eliminate the need for undo and redo 
logs for these queries if the partitioning is performed on an H-store like 
DMBS [? ]. 

Workload known Transactions used in the workload together with some run- 
time statistics are assumed to be known when applying the algorithms. 

Furthermore, following the consensus in the related work (see Section 11.31) we 
simplify the model by not considering time spent on network latency (if all 
vertical partitions are placed locally on a single site, then time spend on network 
latency is trivially zero anyway). A description of how to include latency in the 
model at the expense of increased complexity can be found in Appendix |XJ 

1.1 Outline of approach 

The basic idea is as follows. We are given an input in form of a schema together 
with a workload in which queries are grouped into transactions, and each query 
is described by a set of statistical properties. 

For each query q in the workload and for each table r accessed by q the 
input provides the average number n r of rows from table r that is retrieved 
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from or written to storage by query q. Together with the (average) width w a of 
each attribute a from table r this generally gives a good estimate for how much 
attribute a costs in retrievals/writes by access methods for each evaluation of 
query q, namely W a q — w a ■ n r . 

Given a set of sites, the challenge is now to find a non-disjoint distribution 
of all attributes, and a disjoint distribution of transactions to these sites so 
that the costs of retrievals, writes and inter-site transfers, each defined in terms 
of W a q as explained in details below, is minimized. This means, that the 
primary executing site of any given query is assumed to be the site that hosts 
the transaction holding that query. 

As mentioned above, our algorithms will not break single-sitedness for read 
queries and therefore no additional costs are added to the execution of read 
queries by applying this algorithm. In contrast, since the storage costs (the sum 
of retrieval, write and inter-site transfer costs) for a query is minimized and each 
tuple become as narrow as possible, the total costs of evaluating the queries (e.g. 
processing joins, handling intermediate subresults, etc.) are assumed to be, if 
not minimized, then reduced too. 

1.2 Contributions 

This paper contributes with the following: 

• an algorithm optimized for H-store like architectures, preserving single- 
sitedness for read queries and in which load balance among sites versus 
minimization of total costs can be prioritized arbitrarily, 

• a more scalable heuristic, and 

• a micro benchmark of a) both algorithms based on TPC-C and a set of 
random instances, b) a comparison between the benefits of local versus 
remote partition location, and c) a comparison between disjoint and non- 
disjoint partitioning. 

1.3 Related work 

A lot of work has been done on data allocation and vertical partitioning but 
to the best of our knowledge, no work solves the exact same problem as the 
present paper: distributing both transactions and attributes to a set of sites, 
allowing attribute replication, preserving single-sitedness for read queries and 
prioritizing load balancing vs. total cost minimization. We therefore order the 
references below by increasing estimated problem similarity and do not mention 
work dedicated on vertical partitioning of OLAP databases. 

In ? ? ] reduced the cost of information retrieval by vertically partitioning 
records into a primary and a secondary record segment. This was done by 
constructing a bi-partite graph with two node sets: one set with a node for each 
attribute and one set with a node for each transaction. By connecting attribute 
and transaction nodes with a weighted edge according to their affinity, a min-cut 
algorithm could be applied to construct the partitioning. 

? ] assumed a set of horizontal and vertical fragments of a database was 
known in advance and produced a disjoint distribution of these fragments onto a 
set of network-connected processors using a greedy first-fit bin packing heuristic. 
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Similarly, ? ] distributed a set of predefined fragments to a set of sites, but used 
a linearized quadratic program to compute the solution. 

? ] took as input a geographically distributed database together with statis- 
tics for a query pattern on this database and produced as output a non-disjoint 
distribution of whole database tables to the physical sites so that the total 
amount of transfer was minimized. They modelled the problem as a linearized 
quadratic program which was solved in practice using heuristics. The costs of 
joins were minimized by first transferring join keys and then transferring the 
relevant attributes for the relevant rows to a single collector site. 

? ] constructed a disjoint partitioning with non-remote partition placement. 
They used an attribute affinity matrix to represent a complete weighted graph 
and generated partitions by finding a linearly connected spanning tree and con- 
sidering a cycle MkS (X fragment. 

? ] generated a non-remote, disjoint partitioning minimizing the amount 
of disk access by recursively applying a binary partitioning. The partitioning 
decisions were based on an integer program and with strong assumptions on a 
System-R like architecture when estimating the amount of disk access. 

? ] also constructed a disjoint partitioning with non-remote partition place- 
ment. They used a two-phase strategy where the first phase generated all rel- 
evant attribute groups using association rules [? ] considering only one query 
at a time, and the second phase merged the attribute groups that were useful 
across queries. 

? ] presented an algorithm for generating disjoint partitioning by either 
minimizing costs or by ensuring that exactly k vertical fragments were produced. 
Inter-site transfer costs were not considered. The partitioning was produced 
using a bottom-up strategy, iteratively merging two selected partitions with 
the best "merge profit" until only one large super-partition existed. The k- 
way partitioning was found at the iteration having exactly k partitions and the 
lowest-cost partitioning was found at the iteration with the lowest cost. 

? ] minimized the amount of disk access by constructing a non-remote and 
non-disjoint vertical partitioning. Two binary partitioning algorithms based 
on the branch-and-bound method were presented with varying complexity and 
accuracy. The partitionings were formed by recursively applying the binary 
partitioning algorithms on the set of "reasonable cuts" . 

? ] did not present an algorithm but gave an interesting objective function 
for evaluating vertical partitionings. The function was based on the square-error 
criterion as given in [? ] for data clustering, but did not cover placement of 
transactions which, in our case, has a large influence on the expected costs. 

? ] considered the vertical partitioning problem for three different environ- 
ments: a) single site with one memory level, b) single site with several memory 
levels, and c) multiple sites. The partitions could be both disjoint and non- 
disjoint. A clustering algorithm grouped attributes with high affinity by using 
an attribute affinity matrix together with a bond energy algorithm [? ]. Three 
basic algorithms for generating partitions were presented which, depending on 
the desired environment, used different prioritization of four access and transfer 
cost classes. 
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1.4 Outline of paper 

In section [5] we derive a cost model together with a quadratic program defining 
the first algorithm. Section [3] describes a heuristic based on the cost model 
found in Section [2l and Section [4] discusses a couple of ideas for improvements. 
Computational results are shown in Section [5] 

2 A linearized QP approach 

In this section we develop our base model, a quadratic program (QP), which 
will later be extended to handle load balancing and then linearized in order to 
solve it using a conventional mixed integer program (MIP) solver. 

2.1 The base model 

In a vertical partitioning for a schema and a workload we would like to minimize 
the sum 

A + P B (1) 

where A is the amount of data accessed locally in the storage layer, B is the 
amount of data needed to be transferred over the network during query updates 
and p is a penalty factor. 

We assume that each transaction has a primary executing site. For each 
transaction t G T, each table attribute a £ A, and each site s € S consider 
two decision variables x t . s £ {0,1} and y a , s £ {0,1} indicating if transaction 
t is executed on site s and if attribute a is located on site s, respectively. All 
transactions must be located at exactly one site (their primary executing site), 
that is 

X>m = 1 ,VseS (2) 
ter 

and all attributes must be located at at least one site, that is 

Va,s > 1 , Vs e S. 

To determine the size of A and B from equation ([T]) introduce five new static 
binary constants describing the database schema: 

• ct a ^q indicates if attribute a itself is accessed by query q 

• fia^q indicates if attribute a is part of a table that q accesses 

• indicates if query q is used in transaction t 

• 8 q indicates if query q is a write query 

• (p ai t indicates if any query in transaction t reads attribute a 

Single-sitedness should be maintained for reads. That is, if a read query in 
transaction t accesses attribute a then a and t must be codocated: 

Xt,afa,t = 1=> Ua,s = 1 , Vf £ T, a £ A 
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or equivalently 



Ua.s 



Xt, B <Pa,t > > Vi € T, a e A. 



In order to estimate the cost of reading, writing and transferring data, in- 
troduce the following weights: 

• w a denotes the average width of attribute a 

• f q denotes the frequency of query q 

• rt a , g denotes for query q the average number of rows retrieved from or 
written to the table holding attribute a 

Then the cost of reading or writing a in query q is estimated to W a , q — w a - f q -n a ^ q 
and the cost of transferring attribute a over the network is estimated to pW a , q - 
Notice, that W a ^ q is only an estimate due to f q and n a ^ q . 

Consider the amount of local data access, A, and let A = A^ + Aw where 
Ar and Ay/ is the amount of read and write access, respectively. For a given 
site r and query q, Ar is the sum of all attribute weights W a . q for which 1) q 
is a read query, 2) attribute a is stored on r, 3) the transaction that executes 
query q is executed on r and 4) q accesses any attribute in the table fraction 
that holds a. As we maintain single-sitedness for reads, (3 a , q can be used to 
handle 4), resulting in 



Accounting for local access of write queries, Aw, is less trivial. Consider the 
following three approaches: 

Access relevant attributes An attribute a at site s should be accounted for 
if and only if there exists an attribute a' on s that q updates so that a 
and a' are attributes of the same table. While this accounting is the most 
accurate of the three it is also the most expensive as it implies an element 
of the form y a . s ya'.s in the objective function which adds an undesirable 
amount of |A| 2 |iS| variables and 3|A| 2 |iS| constraints to the problem when 
linearized (see Section l2~3l) . 

Access all attributes We can get around the increased complexity by assum- 
ing that write queries q always writes to all sites containing table fractions 
of tables accessed by q, regardless of whether q actually accesses any of 
the attributes of the fractions. While this is correct for insert statements 
(assuming that inserts always write complete rows) it is likely an overes- 
timation for updates: imagine a lot of single-attribute updates on a wide 
table where the above method would have split the attribute in question 
to a separate partition. This overestimation will imply that the model will 
partition tables that are updated often or replicate attributes less often 
than the accounting model described above. 

Access no attributes Another approach to simplify the cost function is to 
completely avoid accounting for local access for writes and solely let the 
network transfer define the write costs. With this underestimation of write 
costs, attributes will then tend to be replicated more often than in the first 
accounting model. 
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In this paper we choose the second approach, which gives a conservative overes- 
timate of the write costs as we then obtain more accurate costs for inserts and 
avoid extending the model with undesirably many variables and constraints. 
Intuitively speaking, this choice implies that read queries will tend to partition 
the tables for best possible read-performance, and the write queries will tend to 
minimize the amount of attribute replication. We now have 

and thus 

A = X! W a-,ql3a,qjq,t(l ~ Sq)xt, S ya,s + ^Wa.qPa.qSql/a^. (3) 

B accounts for the amount of network transfer and since we enforce single- 
sitedness for all reads B is solely the sum of transfer costs for write queries. We 
assume that write queries only transfer the attributes they update and does not 
transfer to the site that holds their own transaction: 

B = ^ W a,qaa,qlq,tSq(l - X t .s)ya,s- 

a,t,s,q 

By noticing that J2 a ,t,s, q Ua,qlq,ty a ,s = Y^a.,s, q a a, q Va.s we can construct the 
minimization problem as 

S.t. J2s X t,s =1 w 

EsVa.s >1 Va (4) 

y a ,s - x t ,gip a ,t >0 Va,£ 
Xt,s,y a ,s e {0, 1} Vt,a,s 

where 

ci(a,£) = 2jWg, a 7 9 ,t(/3a,g(l ~ $q) -pa a: qSq) 

and 

C2(a) = W a ,q6q(l3 a ,q + pa a>q ). 

Both c\ and C2 are completely induced by the schema, query workload and 
statistics and can therefore be considered static when the partitioning process 
starts. 

2.2 Adding load balancing 

We are interested in extending the model in (j4]) to also handle load balancing 
of the sites instead of just minimizing the sum of all data access/transfer. From 
equation ^ define the work of a single site s E S as 

^c 3 (a,t)x t}S ya,s + ^c 4 (a)?/ a;S (5) 

a.t a 

where c 3 {a,t) = J2 q Wa,q1q,tPa,q(l - M and c 4 (a) = Y^ g w a,qPa,qS q . Introduce 
the variable m and for each site s let the value of ([5]) be a lower bound for m. 
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Adding m to the objective function is then equivalent to also minimizing the 
work of the maximally loaded site. 

In order to decide how to prioritize cost minimization versus load balancing 
in the model, introduce a scalar < A < 1 and weight the original cost from Q 
and m by A and (1 — A), respectively. The new objective is then 

A ^2 ci(a,t)x t , s y a ,s + A^c 2 (a)y a , s + (1 - A)m (6) 

a.t,s a,s 

where m is constrained as follows: 

22c 3 (a,t)x ttS ya, s + ^2ci(a)y a . s <m ,Vse«S. 

a,t a,<7 

Notice that while we are now minimizing ([5]), the objective of (j^]) should still 
be considered as the actual cost of a solution. 

2.3 Linerarization 

We use the technique discussed in [? , Chapter IV, Theorem 4] to linearize 
the model. This is done by replacing the quadratic terms in the model with a 
variable u t . a . s and adding the following new constraints: 

Ut,a,s < Xt,s Vt, a, s 

Ut,a,s<y a ,s Vt, a, s 

U t ,a,s > Xt,s + Va,s - 1 Vt, a, S 

For Ut.a.s > 0, notice that Ut.a.s = 1 if and only if xt. s = y a ,s — 1 and that Ut >a ,s 
is guaranteed to be binary if both Xt lS an d y a ,s are binary (thus, there is no 
need for requiring it explicitly in the model). 

Now, the model in @ extended with load balancing looks as follows when 
linearized: 



A Et, a , s c i(a^)wt,a. 




)y a ,s + (i - 


- A)m 






= i 


Vt 






Va,s 


> i 


Va 






y a .s - Xt,s<Pa,t 


> o 


Va, t 




Ea,t C 3(MK,M "+ 


J2a.,q C i( a )ya.s 


< m 


Vs 








< o 


Vt, a, 


s 




^t,a,s Ua,s 


< o 


Vt, a, 
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U*,a,s - 


■ x t , B - y a ,s + 1 
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Vt, a, 


s 






£{0,1} 


Vt, a, 


s 




Ut,a,s 


> o 


Vt, a, 


s 



(7) 



2.4 Complexity 

The objective function in quadratic programs can be written on the form 

^z T Qz + cz + d 

where in our case z = . . . , iE|t|,|S|> Vi,ii ■ ■ ■ i x \A\,\s\) i s a vector containing 

the decision variables, Q is a cost matrix, c is a cost vector and d a constant. 
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Q can be easily denned from © by dividing Q into four quadrants, letting the 
sub-matrices in the upper-left and lower-right quadrant equal zero and letting 
the upper-right and lower- left submatrices be defined by ci(a, t). Q is indefinite 
and the cost function ^ therefore not convex. As shown by ? ] finding optimum 
when Q is indefinite is NP-hard. 



3 The SA solver — a heuristic approach 

We develop a heuristic based on simulated annealing (see [? ]) and will refer 
to it as the SVl-solver from now on. The base idea is to alternately fix x and 
y and only optimize the not-fixed vector, thereby simplifying the problem. In 
each iteration we search in the neighborhood of the found solution and accept 
a worse solution as base for a further search with decreasing probability. 

Let Xt.s hold an assignment of transactions to sites and define the neighbor- 
hood x' of a; as a change of location for a subset of the transactions so that for 
each t £ T we still have J2 s x t.s = 1- Similarly, let y a , s hold an assignment of 
attributes to sites but define the neighborhood y' of y as an extended replication 
of a subset of the attributes. That is, for each a £ A in that subset we have 
y a ,s — 1 => y'a.s an d J2 s y'a,s > J2sVa,s- We found that altering the location 
for a constant number of 10% of both transactions/attributes yielded the best 
results. The heuristic now looks as pictured in Algorithm [1] Notice, that the 

Algorithm 1 The heuristic based on simulated annealing (SA). It iteratively 
fixes x and y and accepts a worse solution from the neighborhood with decreasing 
probability. 



1 

2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 



Initialize temperature r > and reduction factor p £ ]0; 1[ 

Set the number L of inner loops 

Initialize x randomly so that ([2]) is satisfied 

f ix «- "x" 

S <- findSolution(f ix) 
while not frozen do 
for i £ {1, . . . , L} do 

x <— neighborhood of x 

y <— neighborhood of y 

S' 4- findSolution(f ix) 

A <- cost(S") - cost(S) 

p <— a randomly chosen number in [0; 1] 

if A < or p < e~ A / T then 

S^S' 
end if 

f ix the element in \ {fix} 

end for 

T i — p • T 

end while 



linearization constraints is not needed since either x or y will be constant in 
each iteration. This reduces the size of the problem considerably. 
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4 Further improvements 



Consider a table with n attributes together with two queries: one accessing at- 
tribute 1 through k and one accessing attribute k through n. Then it is sufficient 
to find an optimal distribution for the three attribute groupings {1, . . . , fc — 1}, 
{k} and {k + 1, . . . , n}, considering each group as an atomic unit and thereby 
reducing the problem size. In general, it is only necessary to distribute groups 
of attributes induced by query access overlaps. ? ] refer to these attribute 
overlaps as reasonable cuts. Even though this will not improve the worst-case 
complexity, this reduction may still have a large performance impact on some 
instances. 

Also, assuming that transactions follow the 20/80 rule (20% of the transac- 
tions generate 80% of the load), the problem can be solved iteratively over T 
starting with a small set of the most heavy transactions. 

5 Computational results 

We assume that the context is a database with a very high transaction count 
like the memory-only database H-store [? ] (now VoltDELj) and thus need to 
compare RAM access versus network transfer time when deciding an appro- 
priate network penalty factor p. A PCI Express 2.0 bus transfers between 32 
Gbit/s and 128 Gbit/s while the bandwidth of PC3 DDR3-SDRAM is at least 
136 Gbit/s so the bus is the bottleneck in RAM accesses. We assume that 
the network is well configured and latency is minimal. Therefore the network 
penalty factor could be estimated to p G [3; 128] if either a gigabit or 10-gigabit 
network is used to connect the physical sites. We assume the use of a 10-gigabit 
network and therefore set p = 8 in our tests unless otherwise stated. 

Wc furthermore mainly focus on minimizing the total costs of execution and 
therefore set A low. If A is kept positive the model will, however, choose the 
more load balanced layout if there is a cost draw between multiple layouts. We 
set A = 0.1 in our tests unless otherwise stated. 

All tests were run on a MacBook Pro with a 2.4 GHz Intel Core 2 Duo 
and 4GB 1067 MhZ DDR3 RAM, running Mac OS X 10.5. The GNU Linear 
Programming 

KiE (GLPK) 4.39 was used as MIP solver, using only a single 

thread. 

The test implementation is available upon request. 
5.1 Initial temperature 

The temperature r used in the heuristic described in Section [3] determines how 
willing the algorithm is to accept a worse solution than the currently best found. 
Let C* and C denote the objective for the best solution so far and the currently 
generated solution, respectively. In the computational results provided here we 
accept a worse solution with 50% probability in the first set of iterations if 
< 5%. Referring to the notation used in Algorithm [TJ we have 50% = 
e o.05C*/r an( j tnug an initial temperature of r = -0.05C*/ ln0.5. 

ihttp:/ /voltdb.com 

2 http:/ /gnu.org/software/glpk 
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5.2 The TPC-C v5 instance 

We perform tests on the TPC-C version 5.10.1 benchmark The TPC-C specifi- 
cation describes transactions, queries and database schema but does not provide 
the statistics needed to create a problem instance. We therefore made some sim- 
plified assumptions: all queries are assumed to run with equal frequency and 
all queries (not transactions) are assumed to access a single row except in the 
obvious cases where aggregates are used or there are being iterated over the 
result. In these cases we assume that the query accesses 10 rows. Thereby, the 
New-Order transaction for example, are assumed to access 11 rows in average. 

We model UPDATE queries as two sub-queries: A read-query accessing all 
the attributes used in the original query and a write-query only accessing the 
attributes actually being written (and thus whose update needs to be distributed 
to all replicas). 

5.3 Random instances 

To the best of our knowledge there is no standard library of typical OLTP 
instances with schemas, workloads and statistics so in order to explore the char- 
acteristics of the algorithms we perform some experiments on a set of randomly 
generated instances instead as it showed up to be a considerable administrative 
and bureaucratic challenge (if possible at all) to collect appropriate instances 
from "real life" databases. The randomly generated instances vary in several 
parameters in order to clarify which characteristics that influence the potential 
cost reduction by applying our vertical partitioning algorithms. The parame- 
ters include: number of transactions in workload, number of tables in schema, 
maximum number of attributes per table, maximum number of queries per 
transaction, percentage of queries being updates, maximum number of different 
tables being referred to from a single query, maximum number of individual at- 
tributes being referred to by a single query, the set of allowed attribute widths. 
We define classes of problem instances by upper bounds on all parameters. In- 
dividual instances are then generated by choosing the value of each parameter 
evenly distributed between 1 and its upper bound. That is, if e.g. the maximum 
allowed number of attributes in tables is k, the number of table attributes for 
each table in the generated instance will be evenly distributed between 1 and k 
with a mean of k/2. 

5.4 Results 

In the following we perform a series of tests and display the results in tables 
where each entry holds the found objective of ((H) for the given instance. 

Table [I] explores the influence of a set of parameters in the randomly gener- 
ated instances by varying one parameter at a time while fixing the rest. We test 
two classes of instances using the SA solver: a smaller with ^tables = |T| = 
20 and a larger with ^tables = \T\ = 100. The results suggest that the largest 
workload reduction is obtained for instances having relatively few queries per 
transaction, few updates, many attributes per table and/or a moderate number 
of attribute references per query. The number of table references per query and 

3 http: / /www. tpc.org/tpcc 
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the allowed attribute widths, however, only seem to have moderate influence on 
the result. 





=fttahlps — ITI — 90 

\S\ = 1 \S\ = 2 \S\ = 3 


^tables — T1 — 100 

|.S| = 1 |.S| = 2 |5|=3 


A Max queries ^ 
per transaction 


0.585 0.309 0.278 
1.567 1.478 1.386 
1.305 1.054 0.972 


3.194 1.784 1.471 
5.743 4.550 4.189 
8.840 7.569 6.983 


B Percent jj 
updates queries ^ 


1.747 1.369 1.110 
1.567 1.478 1.386 
1.349 1.244 1.263* 


5.959 4.235 3.510 
5.743 4.550 4.189 
5.106 4.555 4.462 


5 

C Max attributes ^ 
per tabic ^ 


0.520 0.520* 0.520* 
1.567 1.478 1.386 
1.643 0.968 0.850 


2.583 2.772* 2.712* 
5.743 4.550 4.189 
14.970 7.341 5.355 


D Max table 2 
references per 5 
query 10 


0.602 0.430 0.356 
1.567 1.478 1.386 
2.246 1.607 1.516 


3.447 3.022 2.865 
5.743 4.550 4.189 
8.147 6.063 5.623 


E Max attribute 5 
references per 15 
query 25 


0.678 0.288 0.199 
1.567 1.478 1.386 
1.115 0.988 1.008* 


5.176 2.526 1.969 
5.743 4.550 4.189 
5.641 5.909* 5.684* 


F Allowed {2 {4'8j 
attribute widths , \ ' i 
{4, 8, lb) 

Table 1: Comparing the 
using the SA solver. We 


1.194 1.080 1.030 
1.567 1.478 1.386 
2.387 2.160 2.060 

effect of parameter changes, 
test three possible values 


4.456 3.488 3.500* 
5.743 4.550 4.189 
8.912 6.977 7.000 

Results were found 
"or each parameter, 



varying one parameter at the time and fixing all other parameters at their 
default value (marked with bold). The costs are shown in units of 10 6 . Tests 
are divided into two classes having both the number of transactions and 
schema tables equal to 20 (left) and 100 (right), respectively. The results 
suggest that the largest workload reduction, unsurprisingly, is obtained for 
instances having relatively few queries per transaction, few updates, many 
attributes per table and/or a moderate number of attribute references per 
query. The number of table references per query and the allowed attribute 
widths, however, only seem to have moderate influence on the result. 

Table compares the QP and SA solvers on the TPC-C benchmark and a 
set of randomly generated larger instances, divided into two classes with either 
large or low potential for cost reduction. The random instances are described 
in Table [2] where the columns here refer to the single-letter labels for the pa- 
rameters shown in Table [TJ As seen in Table |3] the SA solver is generally 
faster than the QP solver but the QP solver obtains lower costs when the in- 
stances are small. Expectedly, the instances in class "rndB. . . " with many 
attribute references per query but few queries per table gains little or no cost 
reduction by applying the algorithms. TPC-C, on the other hand, gets a cost 
reduction of 37% and the random instances in class "rndA. . . " , with many at- 
tributes per table and relatively few attribute references per query, get a cost 
reduction between 25% and 85%. None of the algorithms found a cost reduction 
for the instances rndAt4xl00 and rndAt8xl00 because of the "overweight" of 
transactions compared to the number of attributes in the schemas. 

Table [4] depicts an actual partitioning of TPC-C constructed by the QP 
solver for three sites. 

Table [S] illustrates the effect of disjoint versus nondisjoint partitioning, that 
is, partitioning without and with attribute replication. As seen, greater cost re- 
duction can be obtained when allowing replication but in exchange to increased 
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Name 


A 


B 


c 


D 


E 


F 


171 


#tables 


rndAt4xl5 


3 


10 


30 


3 


8 


{2,4,8, 16} 


15 


4 


rndAt8xl5 


3 


10 


30 


3 


8 


{2,4,8, 16} 


15 


8 


rndAt8xl5u50 


3 


50 


30 


3 


8 


{2,4,8, 16} 


15 


8 


rndAtl6xl5 


3 


10 


30 


3 


8 


{2,4,8, 16} 


15 


16 


rndAt32xl5 


3 


10 


30 


3 


8 


{2,4,8, 16} 


15 


32 


rndAt4xl00 


3 


10 


30 


3 


8 


{2,4,8, 16} 


100 


4 


rndAt8xl00 


3 


10 


30 


3 


8 


{2,4,8, 16} 


100 


8 


rndAtl6xl00 


3 


10 


30 


3 


8 


{2,4,8, 16} 


100 


16 


rndAt32xl00 


3 


10 


30 


3 


8 


{2,4,8, 16} 


100 


32 


rndBt4xl5 


3 


10 


5 


6 


28 


{2,4,8, 16} 


15 


4 


rndBt8xl5 


3 


10 


5 


6 


28 


{2,4,8, 16} 


15 


8 


rndBtl6xl5 


3 


10 


5 


6 


28 


{2,4,8, 16} 


15 


16 


rndBtl6xl5u50 


3 


50 


5 


6 


28 


{2,4,8, 16} 


15 


16 


rndBt32xl5 


3 


10 


5 


6 


28 


{2,4,8, 16} 


15 


32 


rndBt4xl00 


3 


10 


5 


6 


28 


{2,4,8, 16} 


100 


4 


rndBt8xl00 


3 


10 


5 


6 


28 


{2,4,8, 16} 


100 


8 


rndBtl6xl00 


3 


10 


5 


6 


28 


{2,4,8, 16} 


100 


16 


rndBt32xl00 


3 


10 


5 


6 


28 


{2,4,8, 16} 


100 


32 



Table 2: Random instances used when comparing the QP and SA solvers 
in Table [3] The instances in the upper part (rndA. . . ) are expected to 
get a large cost reduction while instances in the lower part (rndB. . . ) are 
expected to get a small cost reduction. The columns refer to the single- 
letter labels for the parameters shown in Table [1] 













QP 




SA 




Instance 


\A\ 


IT] 


\s\ 


Cost 


Time (s) 


Cost 


Time (s) 


\S\ = 1 


TPC-C v5 


92 


5 


2 


0.133 


1 


0.138 


5 


0.208 


TPC-C v5 


92 


5 


3 


0.132 


6 


0.132 


5 


0.208 


TPC-C v5 


92 


5 


4 


0.132 


33 


0.132 


5 


0.208 


rndAt4xl5 


54 


15 


4 


(0.332) 


1800 


0.396 


10 


0.933 


rndAt8xl5 


105 


15 


4 


(0.324) 


1800 


0.327 


18 


0.808 


rndAtl6xl5 


225 


15 


4 


(0.267) 


1800 


0.309 


41 


1.180 


rndAt32xl5 


492 


15 


4 


(0.315) 


1800 


0.217 


89 


1.491 


rndAt64xl5 


1023 


15 


4 


(0.269) 


1800 


0.268 


190 


1.452 


rndAt4xl00 


54 


100 


4 


(8.001) 


1800 


8.246 


79 


7.946 


rndAt8xl00 


105 


100 


4 


(7.681) 


1800 


8.018 


150 


7.454 


rndAtl6xl00 


225 


100 


4 




t/o 


6.525 


321 


8.741 


rndAt32xl00 


492 


100 


4 




t/o 


4.501 


728 


8.916 


rndAt64xl00 


1023 


100 


4 




t/o 


4.119 


1531 


9.591 


rndBt4xl5 


12 


15 


4 


0.303 


65 


0.303 


3 


0.303 


rndBt8xl5 


27 


15 


4 


(0.448) 


1800 


0.424 


6 


0.440 


rndBtl6xl5 


49 


15 


4 


(0.333) 


1800 


0.334 


9 


0.385 


rndBt32xl5 


98 


15 


4 


(0.319) 


1800 


0.319 


16 


0.361 


rndBt64xl5 


210 


15 


4 


(0.221) 


1800 


0.221 


31 


0.229 


rndBt4xl00 


54 


100 


4 


(4.484) 


1800 


2.251 


18 


2.251 


rndBt8xl00 


105 


100 


4 


(4.323) 


1800 


2.419 


37 


2.419 


rndBtl6xl00 


225 


100 


4 


(2.001) 


1800 


1.774 


62 


1.774 


rndBt32xl00 


492 


100 


4 


(2.419) 


1800 


1.999 


124 


1.999 


rndBt64xl00 


1023 


100 


4 




1800 


2.473 


270 


2.473 



Table 3: Comparing the QP algorithm with the simulated annealing based 
heuristic (SA), allowing attribute replication and with remote partition 
placement. Costs are shown in units of 10 6 . The SA algorithm had a 30 
second time limit for each iteration and if the limit was reached it pro- 
ceeded with another neighborhood. The QP algorithm had a time bound 
of 30 minutes and an MIP tolerance gap of 0.1%. Where the time limit 
was reached, the best found cost (if any) is written in parentheses, "t/o" 
indicates that no integer solution was found within the time limit. 
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Site 1 



Site 2 



Site 3 



| Trim.-- act i < m /Vc;;m/:' 



Custom 


cr.CBALANCE 


Custom 


er.C_.CITY 


Custom 


er.C.CRBDIT 


Custom 


er.C_CREDIT_LIM 


Custom 


cr.CDATA 


Custom 


cr.CDISCOUNT 


Custom 


cr.C_D_ID 


Custom 


cr.C.FIRST 


Custom 


cr.C_ID 


Custom 


cr.CLAST 


Custom 


er.CMIDDLE 


Custom 


er.CPHONE 


Custom 


er.C.SINCE 


Custom 


er.CSTATE 


Custom 


er.C.STREET.l 


Custom 


er.C_STREET_2 


Custom 


cr.C_W_ID 


Custom 


cr.C_.ZIP 


District 


.D.CITY 


District 


.D_ID 


District 


.D.NAME 


District 


.D.STATE 


District 


.D.STREET.1 


District 


.D.STREET.2 


District 


.D_W_ID 


District 


.D.YTD 


District 


.D_ZIP 


History 


H.AMOUNT 


History 


H_C_D_ID 


History 


H_C_ID 


History 


H_C_W_ID 


History 


H.DATA 


History 


H.DATE 


History. H_D_ID 


History.H_W_ID 


OrdcrL 


nc.OL.DIST_INFO 


OrdcrL 


ne.OL_N UMBER 


Stock-S 


.ORDER.CNT 


Stock-S 


.REMOTE.CNT 


Stock-S 


_YTD 


Wareho 


usc.W.CITY 


Ware ho 


usc.W_ID 


Ware ho 


use.W.NAME 


Warcho 


use.W_STREET_l 


Warehouse. W.STREET.2 


Warehouse. W.YTD 


Warehouse. W_ZIP 



| Transacl i» m / L-< '■, i 

Customer. C_CITY 

Customer. C.DELIVERY.CNT 

Customer. C.PAYMENT.CNT 

Customer. C.SINCE 

Customer. C.YTD.PAYMENT 

District. D_ID 

District. D.NEXT.O.ID 

District. D_W_ID 

Item.I_IM_ID 
OrdcrLinc.OL_D_ID 
OrderLine.OL_I_ID 
OrderLine. OL.O.ID 

OrdcrLinc.OL_W_ID 

Stock. S_I_ID 

Stock. S.QUANTITY 

Stock. S_W_ID 



Transaction Delivery 
Transaction NewOrder 
Transaction OrderStatus 

Customer. C_B ALAN CE 
Customer. C.CREDIT 
Customer. C.DISCOUNT 
Customer. C_D_ID 
Customer. C.FIRST 
Customer. C_ID 
Customer. C_LAST 
Customer. C_M ID DLE 

Customer. C_W_ID 

District. D_ID 

District. D.NEXT.O.ID 

District. D.TAX 

District.D_W_ID 

Itcm.LDATA 
Item.IJD 
Item.IJSJAME 
Item. I_P RICE 
NewOrder. NO_D_ID 
NewOrder.NO.OJD 

NewOrder. NO.W.ID 

Order. O.ALL.LOCAL 
Order. O.CARRIER.ID 
Order. 0_C_ID 
Order. 0_D_ID 
Order. O.ENTRY.D 
Order. O.ID 
Order. O.OL.CNT 

Order. Q_W_ID 

OrderLine. OL_A MOUNT 
OrderLine. OL.DELI VERY.D 
OrderLine. OL_D_ID 
OrderLine. OL_I_ID 
OrderLine. OL_0_ID 
OrderLine. OL.QUANTITY 
OrderLine. OL.SUPPLY.W.ID 

OrderLine. OL.W.ID 

Stock. S.DATA 
Stock. S_DIST_01 
Stock. S_DIST_02 
Stock. S_DIST_03 
Stock. S_DIST_04 
Stock. S_DIST_05 
Stock. S_DIST_06 
Stock. S_DIST_07 
Stock. S_DIST_08 
Stock. S_DIST_09 
Stock. S_DIST_10 
Stock. S_I_ID 
Stock. S.QUANTITY 

Stock.S_W_ID 

Warehouse. W_ID 
Warehouse. W.TAX 



Table 4: The result of a vertical partitioning of the TPC-C benchmark 
using the QP solver for three sites. Each column represents the contents 
of a site and is divided into three sub-sections: a header, a section holding 
the transaction names and a longer section holding the attributes assigned 
to the respective site. 
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computation time. 



Instance 


\A\ 


\r\ 


\s\ 


w. replication 
Cost Time (s) 


w/o replication 
Cost Time (s) 


Ratio 


TPC-C v5 


92 


5 


1 


0.208 





0.208 







TPC-C v5 


92 


5 


2 


0.133 


1 


0.207 


1 


64% 


TPC-C v5 


92 


5 


3 


0.132 


6 


0.207 


2 


64% 


TPC-C v5 


92 


5 


4 


0.132 


33 


0.207 


3 


64% 


rndAt4xl5 


54 


15 


2 


4.855 


28 


6.799 


1 


71% 


rndAt8xl5 


105 


15 


2 


4.710 


517 


5.809 


6 


81% 


rndAt8xl5 


27 


15 


2 


4.244 


4 


4.402 





96% 


rndAt!6x!5 


49 


15 


2 


3.410 


34 


3.852 





89% 



Table 5: Computational results from solving the TPC-C benchmark and a 
few random instances with the QP solver. Costs are shown in units of 10 5 . 
The table shows that costs can be reduced by allowing attribute replication 
and that TPC-C does not benefit noticeably from being partitioned and 
distributed to more than two sites. The Ratio column displays the ratio 
between the replicated and non-replicated cost. 

Table[5]compares two different kinds of partition placements: 1) all partitions 
being located at one single site (thereby avoiding inter-site transfers) and 2) 
partitions being located at remote sites. These two situations can be simulated 
by setting p — and p > 0, respectively. The benefits of local placements are 
given by the amount of updates in the workload as only updates cause inter- 
site transfers. More updates implies larger costs for remote placements. For 
a somewhat extreme case, instance "rndAt8xl5u50" , with 50% of the queries 
being updates, the costs are about 33% lower when placing the partitions locally. 



Instance 


1-41 


171 


\s\ 


Local 

Cost (QP) Cost (SA) 


Remote 
Cost (QP) Cost (SA) 


TPC-C v5 


92 


5 


1 


1.916 


1.916 


1.916 


1.916 


TPC-C v5 


92 


5 


2 


1.210 


1.208 


1.221 


1.273 


TPC-C v5 


92 


5 


3 


1.208 


1.208 


1.220 


1.220 


rndAt4xl5 


54 


15 


2 


4.709 


4.742 


4.855 


4.888 


rndAt8xl5 


105 


15 


2 


4.424 


4.808 


4.710 


5.187 


rndAt8xl5u50 


105 


20 


2 


3.189 


3.313 


4.778 


4.873 


rndBt8xl5 


27 


15 


2 


4.365 


4.332 


4.244 


4.730 


rndBtl6xl5 


49 


15 


2 


3.335 


3.387 


3.410 


3.404 


rndBtl6xl5u50 


49 


20 


2 


5.066 


5.220 


5.438 


5.438 



Table 6: Comparing the costs of local (p — 0) versus remote (p > 0) 
location of partitions and with attribute replication allowed. Costs are in 
units of 10°. Write-rarely instances or instances in class "rndB. . . " do not 
benefit noticeably by placing all partitions locally, even the instances with 
50% update queries, however instances in class "rndA. . . " with a large 
update ratio do. The reason is that only updates cause inter-site transfer. 
That the costs of the local placement for rndBt8xl5 is larger than when 
placed remotely is since A > 0. 
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6 Conclusion 



We have constructed a cost model for vertical partitioning of relational OLTP 
databases together with a quadratic integer program that distributes both at- 
tributes and transactions to a set of sites while allowing attribute replication, 
preserving single-sitedness for read queries and in which load balancing vs. total 
cost minimization can be prioritized arbitrarily. 

We also presented a more scalable heuristic which seems to deliver good 
results. For both algorithms we obtained a cost reduction of 37% in our model 
of TPC-C and promising results for the random instances. Even though the 
latter theoretically can be constructed with arbitrary high/low benefits from 
vertical partitioning, the test runs on our selected subset of random instances 
seem to indicate that 1) our heuristic scales far better than the QP-solver, and 
2) it can obtain valuable cost reductions on many real-world OLTP databases, 
as we tried to select the parameters realistically. 

One thing we miss, however, is an official OLTP testbed - a library con- 
taining realistic OLTP workloads, schemas and statistics. Such a collection of 
realistic instances could serve as base for several insteresting and important 
studies for understanding the nature and characteristics of OLTP databases. 
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A Latency 



This section describes how to extend the algorithms to also estimate costs of 
network latency for queries accessing attributes on remote sites. We assume, 
that all remote access (if any) for queries are done in parallel and with a constant 
number of requests per query per remote site. Let p\ denote a latency penalty 
factor and introduce a new binary variable ip q for each query q indicating with 
ijjq = 1 if q accesses any remotely placed attributes. Letting n denote the 
number of remotely accessed attributes by q we have n > =£- ip q = 1 and 
n = ip q = 0, or equivalently (ip q — l)n = and ipq — n < 0. This results in 
the following two classes of new constraints: 

(ip q - l)^5 ? a Q ,g7g,t(l - x t , s )y a ,s = ,Vg,t 

a,s 

and 

tp q - ^S q a a ^j qit {l - x t , s )y a ,s < ,Vg,t 

a,s 

The total latency in a given partitioning can now be estimated by the sum 
pi J2 q fq^Pq which can be added to the cost objective function Qj. 
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